-
Notifications
You must be signed in to change notification settings - Fork 11
Add remote symbolication support with build-id and PC offset #324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add remote symbolication support with build-id and PC offset #324
Conversation
b41a6eb to
74c5410
Compare
Benchmarks [x86_64 wall]Parameters
See matching parameters
SummaryFound 0 performance improvements and 1 performance regressions! Performance is the same for 14 metrics, 23 unstable metrics.
|
Benchmarks [x86_64 cpu,wall,alloc,memleak]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics. |
Benchmarks [x86_64 memleak,alloc]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics. |
Benchmarks [x86_64 alloc]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 24 unstable metrics. |
Benchmarks [x86_64 memleak]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics. |
Benchmarks [x86_64 cpu]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics. |
Benchmarks [x86_64 cpu,wall]Parameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 23 unstable metrics.
|
Benchmarks [aarch64 cpu]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics. |
Benchmarks [aarch64 memleak,alloc]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics. |
Benchmarks [aarch64 alloc]Parameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 22 unstable metrics.
|
Benchmarks [aarch64 wall]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics. |
Benchmarks [aarch64 cpu,wall]Parameters
See matching parameters
SummaryFound 2 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 21 unstable metrics.
|
Benchmarks [aarch64 cpu,wall,alloc,memleak]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics. |
Benchmarks [aarch64 memleak]Parameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics. |
1788096 to
55578ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
VMX and VM stack walkers were bypassing remote symbolication by directly returning resolved symbol names. This caused frames to show as 'burn_cpu_recursive' instead of '<build-id>.<remote>(0x<offset>)' format. Extracted resolveNativeFrame() as shared function and added applyRemoteSymbolicationToVMFrames() to post-process VM walker output, converting resolved symbols back to RemoteFrameInfo structures when libraries have build-ids. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaced malloc() calls with pre-allocated pool to ensure signal handler safety and eliminate memory leaks. Pool uses atomic operations for lock-free allocation across 16 lock-strips (128 entries each, ~48KB total). Also fixed documentation inaccuracies regarding file names, usage examples, and JFR output format based on PR review feedback. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add resolveNativeFrameForWalkVM helper to profiler.h/cpp - Patch walkVM to use remote symbolication at native frame resolution point - Remove broken applyRemoteSymbolicationToVMFrames function - Add lock_index parameter to all walkVM signatures via patching.gradle - Update stackWalker_dd.h wrappers to pass lock_index - Remove dead non-const operator[] from codeCache.h - Add alignment check for ELF program headers in symbols_linux_dd.cpp 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This comment was marked as outdated.
This comment was marked as outdated.
5ee68d5 to
45d0122
Compare
- Document resolveNativeFrame() and resolveNativeFrameForWalkVM() helpers - Add section on upstream stack walker integration via patching.gradle - Update Memory Management section with pre-allocated pool details - Add ELF security details (bounds/alignment checks) - Document walkVM integration at native frame resolution point - Remove LinearAllocator from future enhancements (already using pre-allocated pool) - Update file structure to include all modified files - Clarify stack walker integration architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This comment was marked as outdated.
This comment was marked as outdated.
The RemoteFrameInfo pool counters were being reset on every profiler start, which caused issues when reset=false (profiler restart without clearing traces). Standby traces could still reference RemoteFrameInfo objects, so resetting the pool unconditionally would corrupt those traces. This fix ensures pool counters are reset only when traces are cleared (reset=true), maintaining synchronization between trace storage and pool state: - When reset=true: traces cleared AND pool counters reset - When reset=false: traces kept AND pool counters kept Without this fix: - First test run: works (pool starts at 0) - Subsequent test runs: pool exhausted → fallback to resolved symbols - Production profilers: would fail after restart Changes: - Reset pool counters inside _call_trace_storage.clear() under lock - Initialize counter to 0 only on first-time pool allocation - Update comment to clarify counter is reset when traces are cleared 🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous fix (c90644d) only reset the pool when reset=true || _start_time==0. This caused pool exhaustion in parameterized tests where the profiler singleton is reused across multiple test iterations: - Test 1 (vm mode): _start_time==0, pool resets → PASS - Test 2 (vmx mode): _start_time!=0, reset=false → pool NOT reset → FAIL - Test 3+ (fp, dwarf): Same failure Root cause: Pool reset was conditional, but profiler.stop() guarantees ALL standby traces are flushed before start() is called again. Therefore, no traces can reference old RemoteFrameInfo objects, making it safe to ALWAYS reset the pool on start(). This fix adds an unconditional pool reset outside the if(reset || _start_time==0) block, ensuring the pool starts fresh for each profiling session. Also removed debug TEST_LOG statements that were added during investigation. 🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit fixes two issues that prevented remote symbolication from
working correctly with the test library libddproftest:
1. Late library loading: The test library loads via JNI during test
execution, after profiling starts. Enable dlopen hook for remote
symbolication mode to extract build-IDs immediately when libraries
are dynamically loaded.
2. JMC API frame format mismatch: Remote frames store build-ID in
type.name (bare, no suffix) and "<remote>" in method.name, but the
test expected "build-id.<remote>" format. Update test to correctly
parse the actual JFR frame structure.
Changes:
- profiler.cpp: Add updateBuildIds() call in dlopen_hook when remote
symbolication is enabled, enable switchLibraryTrap for remote
symbolication mode, move unpatch_libraries() after JFR serialization
- profiler.h: Increase REMOTE_FRAME_POOL_SIZE to 1024 entries per strip
- libraries.cpp: Remove debug logging
- RemoteSymbolicationTest.java: Fix frame detection logic to check
methodName.equals("<remote>") AND className.equals(testLibBuildId),
refactor to use IMCStackTrace.getFrames() API, clean up debug output
Test results: All cstack modes (vm, vmx, fp, dwarf) now pass with
remote symbolication frames correctly detected.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactors updateBuildIds() to use explicit per-library caching, mirroring the _parsed_inodes pattern from symbols_linux.cpp. This provides O(1) amortized performance on subsequent dlopen calls instead of O(N) iteration. Implementation: - Move Linux build-ID extraction to libraries_linux.cpp with static cache - Add libraries_macos.cpp with no-op stub for non-Linux platforms - Track processed libraries via std::unordered_set<const CodeCache*> - Thread-safe with mutex protection Performance: - Before: O(N) iteration per dlopen (100 libs × hasBuildId check) - After: O(N) hash lookups = O(1) amortized (~50-100x improvement) - Memory cost: 8 bytes per library (~800 bytes for 100 libraries) Follows project pattern of platform-specific *_linux.cpp files instead of #ifdef guards in main implementation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The VM and VMX stack walk modes were failing because they attempted to call NativeFunc::mark() on packed remote symbolication data. When frame_bci == BCI_NATIVE_FRAME_REMOTE, the method_id field contains a packed integer (lib_index in upper 20 bits, pc_offset in lower 44 bits), not a string pointer. NativeFunc::mark() does pointer arithmetic assuming the value is a string with metadata stored before it, causing undefined behavior when given a packed integer. Changes: - Add upstream patch to guard NativeFunc::mark() check with frame_bci != BCI_NATIVE_FRAME_REMOTE in stackWalker.cpp - Implement packed data encoding in populateRemoteFrame() and resolveNativeFrameForWalkVM() - Implement unpacking logic in flightRecorder.cpp - Add getLibraryByIndex() helper to libraries.h - Remove ExtendedCallFrame approach in favor of packing into existing ASGCT_CallFrame structure All stack walk modes (vm, vmx, fp, dwarf) now pass tests with remote symbolication enabled. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit simplifies the remote symbolication implementation by: 1. Added RemoteFramePacker utility struct for clean pack/unpack of data - Packs pc_offset (44 bits), mark (3 bits), lib_index (17 bits) - Provides clear API for packing/unpacking remote frame data 2. Removed marked ranges infrastructure for simplicity - Deleted MarkedRange struct and related methods from CodeCache - Removed isMarkedAddress(), sortMarkedRanges(), addMarkedRange() - Reverted to simpler binarySearch + NativeFunc::mark approach - Same O(log n) performance but less code complexity 3. Optimized to eliminate duplicate symbol resolution - populateRemoteFrame() now accepts mark parameter - Avoids duplicate binarySearch() calls (50% reduction) 4. Improved documentation accuracy - Clarified that we defer full symbol resolution, not mark checking - Documented actual performance characteristics - Fixed misleading comments about eliminating symbol resolution The key insight: Since we need binarySearch() to get symbol names for mark checking anyway, there's no performance benefit to maintaining a separate marked ranges index. The simpler approach has the same O(log n) cost but significantly less code complexity. All tests pass including RemoteSymbolicationTest covering all 4 stack walk modes (vm, vmx, fp, dwarf). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit addresses code review findings and adds observability improvements: 1. **Fix type mismatch in getLibraryByIndex** - Changed parameter from uint16_t to uint32_t - Matches lib_index packing width (17 bits = max 131K libraries) - Prevents theoretical overflow for large library counts 2. **Clarify library array stability during serialization** - Added comment explaining lockAll() ensures array stability - Corrected initial misunderstanding about race conditions - No actual race exists - serialization happens with profiling locked 3. **Add metrics for remote symbolication** - REMOTE_SYMBOLICATION_FRAMES: Track frames using remote symbolication - REMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Libraries with extracted build-IDs - REMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Cache efficiency tracking - Enables production monitoring of feature adoption and performance All changes verified with RemoteSymbolicationTest passing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated documentation to accurately describe the packed data approach and recent improvements: 1. **Packed Data Architecture** - Documented RemoteFramePacker utility struct - Explained 64-bit packing: pc_offset (44) | mark (3) | lib_index (17) - Clarified zero memory overhead vs previous pool approach 2. **JFR Serialization Details** - Corrected output format (no parentheses in signature) - Documented unpacking flow with RemoteFramePacker - Explained thread safety with lockAll() held 3. **Performance Characteristics** - Updated with actual performance (identical to traditional) - Documented duplicate lookup elimination optimization - Clarified O(1) build-ID cache efficiency 4. **Observability Metrics** - Added new section documenting 3 tracking counters - REMOTE_SYMBOLICATION_FRAMES: Feature usage - REMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Coverage - REMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Cache efficiency 5. **Implementation Notes** - Expanded thread safety section with lockAll() guarantees - Clarified signal handler safety with packed representation - Added "Design Evolution" explaining simplification 6. **Library Integration** - Documented getLibraryByIndex() with uint32_t parameter - Explained O(1) cache lookup in updateBuildIds() Documentation now accurately reflects the production-ready implementation with packed data, simplified mark checking, and comprehensive metrics. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Consolidates all documentation under the doc/ directory instead of having a separate docs/ directory. This matches the existing pattern where documentation files are in doc/. - Renamed docs/architecture/CallTraceStorage.md to doc/architecture/CallTraceStorage.md - Renamed docs/architecture/TLSContext.md to doc/architecture/TLSContext.md - Renamed docs/architecture/TlsPriming.md to doc/architecture/TlsPriming.md - Removed empty docs/ directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated references in ddprof-stresstest/README.md to point to the new doc/architecture/ location instead of doc/ directly. - Changed doc/CallTraceStorage.md to doc/architecture/CallTraceStorage.md This completes the consolidation of all documentation under the doc/ directory with proper subdirectory organization. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Rename all documentation files under doc/ to use consistent PascalCase naming convention for improved consistency across the codebase. Changes: - event-type-system.md → EventTypeSystem.md - MODIFIER_ALLOCATION.md → ModifierAllocation.md - profiler-memory-requirements.md → ProfilerMemoryRequirements.md - REMOTE_SYMBOLICATION.md → RemoteSymbolication.md Update references: - README.md: Update link to RemoteSymbolication.md Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Update remotesym argument parsing to match the robust pattern used by other BOOL flags (like mcleanup): - Explicitly handle false values: 'n', 'no', 'f', 'false', '0' - Explicitly handle true values: 'y', 'yes', 't', 'true', '1' - Handle no-value case: 'remotesym' alone enables the feature - Any other value defaults to enabled (consistent with mcleanup) This prevents accepting invalid values like "remotesym=yikes" and makes the behavior consistent with other boolean flags in the codebase. Added comprehensive test coverage: - Test no-value case (should enable) - Test numeric values (0 and 1) - Test "no" and "n" variants (should disable) - Test invalid values (document default behavior) Addresses review comments: - #324 (comment) - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Address security and robustness issues in ELF build-ID extraction: 1. Integer overflow vulnerability (symbols_linux_dd.cpp) - Replace addition-based bounds check with subtraction pattern - Prevents integer wrapping with malicious n_namesz/n_descsz values - Progressive checks: header size, name size, desc size independently - Uses safe pattern: remaining = note_size - offset - sizeof(header) - Each component verified against remaining before subtraction 2. Test race condition (remotesymbolication_ut.cpp) - Replace fixed path "/tmp/not_an_elf" with mkstemp() - Use unique temporary file: "/tmp/not_an_elf_XXXXXX" - Eliminates race conditions in concurrent test environments - Avoids conflicts when tests run in parallel - Follows best practices for temporary file creation Addresses review comments: - #324 (comment) - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use portable PRIxPTR macro instead of %lx for formatting uintptr_t values to ensure correct behavior across all platforms. Changes: - Added #include <inttypes.h> for PRIxPTR macro - Changed format string from "0x%lx" to "0x%" PRIxPTR - Ensures correct formatting on Windows x64 where uintptr_t is unsigned long long, not unsigned long - Maintains compatibility across all platforms (Linux, macOS, Windows) The %lx format specifier assumes uintptr_t == unsigned long, which is not portable. On Windows x64, uintptr_t is unsigned long long, which would cause format string warnings or incorrect output. Addresses review comment: - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Root Cause Analysis: OpenJ9 uses JVMTI-based CPU profiling when shouldUseAsgct() returns false, falling back to the j9_engine (profiler.cpp:1148-1157). JVMTI's GetAllStackTracesExtended() API only captures Java stack frames and does not include native frames from dynamically loaded libraries. Test Requirement: RemoteSymbolicationTest requires native frames from libddproftest.so to validate remote symbolication. The test calls RemoteSymHelper.burnCpu() and computeFibonacci() which invoke native methods, expecting these frames to appear in samples with build-ID and PC offset information. Observed Failure: Test consistently failed after 10 retry attempts with: - 48-52 samples collected per attempt - 0 frames from libddproftest found - Log shows: [TEST::INFO] J9[cpu]=jvmti - Library loaded successfully with valid build-ID This is a fundamental limitation of JVMTI-based profiling on OpenJ9, not a bug in remote symbolication. The feature works correctly on HotSpot JVMs which use signal-based profiling (perf_events/itimer) that captures both Java and native frames. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- flightRecorder.cpp: Fix incorrect "2025, 2026" to "2026" - arguments.cpp: Add Datadog copyright for modifications Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…dd.cpp Added detailed comments with links to official specifications: - ELF Specification from Linux Foundation - LSB ELF Note Section format - GNU build-id feature documentation - GNU binutils ld --build-id option - readelf(1) manual page Also added inline documentation of ELF note structure (Elf64_Nhdr) with field-by-field explanation and 4-byte alignment requirements per spec. This makes it easier for future developers to understand the build-id extraction implementation and verify correctness against specifications. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
What does this PR do?:
Adds remote symbolication support to the Java profiler, enabling native frames to be symbolicated remotely by backend services instead of locally by the agent.
Motivation:
How it works:
When remote symbolication is enabled (
remotesym=true), the profiler stores build-ID and PC offset information for native frames instead of resolving symbols locally. This raw addressing information is serialized into JFR format and sent to backend services for symbol resolution.Key Components:
Build-ID Extraction (
symbols_linux_dd.h/cpp)_build_id_processedset to prevent redundant extractionPacked Frame Data (
profiler.h/cpp,vmEntry.h)pc_offset (44 bits) | mark (3 bits) | lib_index (17 bits)Stack Walker Integration (
stackWalker_dd.h,gradle/patching.gradle)convertNativeTrace()→populateRemoteFrame()resolveNativeFrameForWalkVM()called during stack walkJFR Serialization (
flightRecorder.cpp/h)<remote>marker0x<offset>)Library Management (
libraries.h/cpp,codeCache.h/cpp)Configuration:
# Enable remote symbolication java -agentpath:libjavaProfiler.so=start,cpu,remotesym=true,file=profile.jfr MyAppObservability:
Three new counters track feature usage:
REMOTE_SYMBOLICATION_FRAMES: Frames using remote symbolicationREMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Libraries with extracted build-IDsREMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Build-ID cache efficiencyThread Safety:
_build_id_lockmutexlockAll()holdPerformance:
Platform Support:
Testing:
Test Coverage:
remotesymbolication_ut.cpp,remoteargs_ut.cppRemoteSymbolicationTest.java(all cstack modes: vm, vmx, fp, dwarf)libddproftest.sowith guaranteed build-idDocumentation:
Backward Compatibility:
For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.